Let's look at the haplotypes:

\begin{tabular}{|c|c|c|c|} \hline & AB & Ab & bb \\ \hline AB & (AB,AB) & (AB,Ab) & (Ab,Ab) \\ \hline Ab & (AB,aB) & (AB,bb);(Ab,aB) & (AB,aB) \\ \hline bb & (aB,aB) & (aB,bb) & (bb,bb) \\ \hline \end{tabular}

Ambiguity in haplotypes occur whenever any of loci '1,2' is heterozygous or both are heterozygous.

$n_{aB,aB}$ in this case gives rise to twp haplotype pairs: $(bb,AB);(aB,Ab)$

We cannot directly determine the exact count from the genotype information.

In otherwords the haplotype counts $n_{(bb/AB)}$ and $n_{(aB/Ab)}$ are the missing data.

Thus, missing data: $n_{AB/bb}$ and $n_{Ab/Ab}$.

We assume there $N$ individuals and hence there are $2N$ haplotypes.

\textbf{Observed data: }$Y=(n_{ABAB},n_{bbAB},n_{bbbb},n_{ABAb},n_{bbAb},n_{Abbb}, n_{AbAb})$

\textbf{Missing Data}: $n_{AB/bb}$ and $n_{Ab/aB}$

We construct complete data as the haplotype counts:

\textbf{Complete Data:} $n_{AB}, n_{Ab}, n_{aB}, n_{bb}$

Parameters: $\theta= (p_{AB},p_{Ab},p_{aB},p_{bb})$

and hence the Complete data likelihood is given by:

$$ g(n_{AB}, n_{Ab}, n_{aB}, n_{bb} | \theta ) = \frac{2N}{n_{AB}!n_{Ab}!n_{aB}!n_{bb}!} p_{AB}^{n_{AB}}p_{Ab}^{n_{Ab}}p_{aB}^{n_{aB}}p_{bb}^{n_{bb}} $$

In the $E$ step. we perform ($m^{th}$ step): \begin{eqnarray*} \hat{n_{AB}} = E[n_{AB} | Y, \theta_m]\\ \hat{n_{Ab}} = E[n_{Ab} | Y, \theta_m]\\ \hat{n_{aB}} = E[n_{aB} | Y, \theta_m]\\ \hat{n_{bb}} = E[n_{bb} | Y, \theta_m]\\ \end{eqnarray*}

where $\theta_m=(p_{AB}^{(m)}, p_{Ab}^{(m)}, p_{aB}^{(m)}, p_{bb}^{(m)} )$

Just consider $n_{AB}$ for now.

\begin{align*} n_{AB} &= E[n_{AB} | Y, \theta_m]\\ &= 2n_{ABAB} + n_{ABaB} + n_{AbAB} + E[n_{AB/bb}|Y,\theta_m] \end{align*}

where the last term comes because the $AB$ haplotype can also come from the ambiguos we highlighted in the table above $(bb,AB);(aB,Ab)$

Now, we need to consider:

\begin{align*} E[n_{AB/bb}|Y,\theta_m] &= n_{AbAb} P({AB/bb}|{Ab/aB},{AB/bb}) \\ &= n_{AbAb} \times ( \frac{2p_{AB}p_{bb}}{2p_{AB}p_{bb}+2p_{Ab} p_{aB}}) \end{align*}

Where the latter term comes out from the conditional probability of observing $Ab/aB$ haploltype given it is coming from
a heterozygous subpopulation at both A,B

Thus, the $E$ step gives us:

\begin{align*} \hat{n_{AB}} &= 2n_{ABAB} + n_{ABaB} + n_{AbAB} + n_{AbAb} \frac{p_{AB}p_{bb}}{p_{AB}p_{bb}+p_{Ab} p_{aB}}\\ \hat{n_{Ab}} &= 2n_{ABbb} + n_{ABAb} + n_{AbAB} + n_{AbAb} \frac{p_{Ab}p_{aB}}{p_{AB}p_{bb}+p_{Ab} p_{aB}}\\ \hat{n_{aB}} &= 2n_{bbAB} + n_{1AB0} + n_{bbAb} + n_{AbAb}\frac{p_{aB}p_{Ab}}{p_{AB}p_{bb}+p_{Ab} p_{aB}}\\ \hat{n_{bb}} &= 2n_{bbbb} + n_{1Ab0} + n_{bbaB} + n_{AbAb}\frac{p_{bb}p_{AB}}{p_{AB}p_{bb}+p_{Ab} p_{aB}} \end{align*}



In [1]:

    
from __future__ import division
from ipy_table import *

locus1_alleles = ['a', 'A']
locus2_alleles = ['b', 'B']

locus1_genotypes = set([('').join(sorted(x+y)) for x in locus1_alleles for y in  locus1_alleles])
locus2_genotypes = set([('').join(sorted(x+y)) for x in locus2_alleles for y in  locus2_alleles])
locus1_genotypes = sorted(locus1_genotypes)
locus2_genotypes = sorted(locus2_genotypes)



In [2]:

    
phenotypes = sorted(locus2_genotypes)
phenotypes = [['Locus'] + phenotypes]
for i, l1 in enumerate(locus1_genotypes):
    phenotypes.append([locus1_genotypes[i]])
    for j, l2 in enumerate(locus2_genotypes):
        phenotype = list(set([('').join(sorted(x+y)) for x in l1 for y in l2]))
        if len(phenotype)==1:
            element = '{}/{}'.format(phenotype[0], phenotype[0])
        elif len(phenotype)==2:
            element = '{}/{}'.format(phenotype[0], phenotype[1])
        else:
            element = '{}/{}, {}/{}'.format(phenotype[0], phenotype[3], phenotype[1], phenotype[2])
        phenotypes[i+1].append(element)



In [3]:

    
make_table(phenotypes)
apply_theme('basic_both')
set_cell_style(2, 2, color='red')









    Out[3]:




Locus BB Bb bb
AA AB/AB AB/Ab Ab/Ab
Aa AB/Ba AB/Ab,&nbspab/Ba Ab/ab
aa Ba/Ba ab/Ba ab/ab



In [4]:

    
data = [[10,15,5], [10,50,13], [3,13,10]]
total = sum([sum(i) for i in zip(*data)])



In [5]:

    
print 'Genotype table'
make_table(data)









    



Genotype table






    Out[5]:




10 15 5
10 50 13
3 13 10



In [6]:

    
total









    Out[6]:





129



In [7]:

    
# Initialise x
x = 10
# Initialise x
newx = 20

counts = {}

# Initialise genotypes
for i, l1 in enumerate(locus1_genotypes):
    for j, l2 in enumerate(locus2_genotypes):
        counts[l1+l2] = data[i][j]

# Initialise Probability           
p = {'AB':1/6, 'Ab': 1/3, 'aB': 1/3, 'ab': 1/6}



In [8]:

    
print 'Initial Probabilities:'
for key in p.keys():
    print key, p[key]









    



Initial Probabilities:
AB 0.166666666667
ab 0.166666666667
aB 0.333333333333
Ab 0.333333333333



In [9]:

    
while abs(newx-x)>= 1e-6:
    x = newx
    newx = counts['AaBb'] * (p['AB']*p['ab']/(p['AB']*p['ab']+p['Ab']*p['aB']))
    p['AB'] = (2*counts['AABB']+ counts['AABb'] + x)/(2*total)
    p['Ab'] = (counts['AABb']+2*counts['AAbb']+counts['AaBb']-x)/(2*total)
    p['aB'] = (counts['AaBB']+counts['AaBb']-x+2*counts['aaBB']+counts['Aabb'])/(2*total)
    p['ab'] = (x+counts['aaBb']+counts['Aabb']+2*counts['aabb'])/(2*total)



In [10]:

    
x









    Out[10]:





42.762664440703766



In [11]:

    
print 'Final probabilities'
for key in p.keys():
    print key, p[key]









    



Final probabilities
AB 0.301405676127
ab 0.344041335041
aB 0.14045478899
Ab 0.124950913021

Thus the final value of $x$ is: 42 where $x$ is the count of $AB/ab$ phenotypes.

Locus	BB	Bb	bb
AA	AB/AB	AB/Ab	Ab/Ab
Aa	AB/Ba	AB/Ab,&nbspab/Ba	Ab/ab
aa	Ba/Ba	ab/Ba	ab/ab